ABSTRACT


This paper presents flower and leaf recognition for plant identification using a
Convolutional Neural Network (CNN). In this study, the performance of a CNN for plant
identification using images of leaves, flowers and a combination of both is investigated.
Two publicly available datasets, namely the Folio Leaf dataset and the Flower Recognition
dataset, have been used for training and testing. CNNs have been proven to produce excellent
results for object recognition, but their performance can still be influenced by the type of
images and the number of layers of the CNN architecture. Experimental results indicate that
using leaf images only achieves the highest accuracy for plant identification (98%), compared
to images of flowers only (74%) or the combination of both (85%).

Keywords: CNN, Deep learning, Flower recognition

1. INTRODUCTION

Deep learning is a machine learning technique that teaches computers to do what comes naturally to
humans: learn by example [1]. In deep learning, a computer model learns to perform classification tasks
directly from images, text, or sound. Within the past few years, deep learning algorithms, particularly
Convolutional Neural Networks (CNNs), have proven their powerful feature representation capabilities
in computer vision [2]. The models are trained using a large set of labeled data and CNN architectures
with various numbers of layers.

Recent advances in hardware technology have enabled the evolution of CNNs and a massive number
of their applications, including complicated tasks such as object recognition and image classification [3].
This has led to breakthroughs over the last decade in various fields related to pattern recognition,
from image processing to voice recognition [4]. The capabilities of CNNs have become well known and are
used in various recognition problems such as flower categorization [5], leaf recognition [6], voice analysis [7],
image classification [8], fruit classification and ripeness grading [9], food recognition [10],
and plant disease identification [11].

A CNN is a sub-class of Artificial Neural Network (ANN), an information processing paradigm that is
inspired by the way the biological nervous system works, that is, how the brain processes information.
The brain consists of a large number of neurons interacting together to solve certain problems [3, 12]. An ANN
is a mathematical representation of the human nervous architecture and belongs to the field of artificial
intelligence [13]. It is still unclear whether a general ANN can modify and adapt itself to handle various
other tasks, whereas the CNN has proven to be effective in major tasks [14].

As in other image recognition tasks, plant identification depends on computational strategies to
extract discriminative features from images. Features have historically been handcrafted.
However, a recent trend in machine learning has shown that learned representations are more practical and
economical [15]. Several parts of a plant can be used by a botanist to recognize it, and various
efforts have been made using flowers, leaves, and roots [16]. Leaves are the most widely used because they
are more convenient to examine. In addition, they provide important diagnostic characters and discriminative
features for plant identification, and strong results have been reported [17-19]. Nevertheless, some research
has also been conducted on flower recognition to identify plant species. Even though flowers come from many
different species, some of them have very similar characteristics and appearance. This mix of similarity and
dissimilarity makes highly accurate flower recognition very challenging [20].

The purpose of identifying plants is to categorize them for recording purposes. Identifying a plant
from its flowers and leaves is an easy task for botanists, as they can simply recognize it
using their knowledge [21]. For machines, by contrast, achieving the same recognition results requires
image-processing techniques that extract visual information and compare it to existing sets of
data [7]. Deep structured learning, better known as deep learning, has been recognized as a new area in
computer vision that has been reported to produce excellent results [22].

Flower and leaf recognition has been studied by several researchers using various
techniques. Hu's seven-moment algorithm has been applied to flower recognition with almost 80% accuracy
[23]. With data augmentation, accuracies of 99.04% using AlexNet and 99.42% using GoogleNet have been
obtained [24]. A Support Vector Machine (SVM) with texture features has achieved 99% accuracy [20].
In addition, shape features and colour histograms with a k-nearest neighbour classifier have been applied with
87.2% accuracy [6]. Since the reported results on the Folio Leaf dataset and the Flower dataset were very
positive, these datasets were chosen for the experiments in this research.

In this paper, the accuracy of plant identification is compared between a set of flowers, a set of
leaves and a combination of both. This paper is organized as follows. The next section discusses the
CNN and the features that have been utilized for plant identification, followed by an analysis of the
results of the experiments using different datasets and different numbers of CNN layers. The last section
concludes the paper and outlines future work.


2. RESEARCH METHOD 
2.1. Convolutional Neural Network (CNN) 

A CNN consists of four types of layers: convolution layers, pooling layers, Rectified Linear
Unit (ReLU) layers and fully connected layers. Convolution layers extract features from the input image
using the convolution operation and produce a feature map [19]. Multiple convolution layers can be applied
to produce different feature maps, which ensures that a variety of features is extracted. Next, the pooling
layer reduces the size of the feature maps. This process makes the representation robust against noise and
distortion [2]. The CNN also relies on a third type of layer, the activation function. A CNN may use specific
functions such as ReLU to efficiently implement non-linear activation. All negative values in the feature map
are replaced with zero in the ReLU layer [6]. The fully connected layer, which is the last layer, sums the
weighted features of the previous layers to determine the output.
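The four layer types can be illustrated on a toy input. The following is a minimal pure-Python sketch, not the Matlab implementation used in the experiments; the 6 by 6 image, the edge filter and the fully connected weights are hypothetical values chosen only so that the numbers are easy to follow.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and return the feature map.

    As in most CNN libraries, the kernel is not flipped (cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace every negative value in the feature map with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size, stride):
    """Keep the maximum of each size x size window, moving by `stride`."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, stride):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, stride):
            row.append(max(feature_map[i + di][j + dj]
                           for di in range(size) for dj in range(size)))
        pooled.append(row)
    return pooled


def fully_connected(features, weights, biases):
    """Weighted sum of the flattened features, one score per output class."""
    return [sum(w * f for w, f in zip(ws, features)) + b
            for ws, b in zip(weights, biases)]


# Hypothetical 6x6 image with a bright vertical band in the middle.
image = [[0, 0, 1, 1, 0, 0] for _ in range(6)]
# Hypothetical 3x3 filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = conv2d_valid(image, kernel)       # 4x4, rows are [3, 3, -3, -3]
activated = relu(feature_map)                   # rows become [3, 3, 0, 0]
pooled = max_pool(activated, size=2, stride=2)  # 2x2
flat = [v for row in pooled for v in row]
# Hypothetical weights for two output classes.
scores = fully_connected(flat, [[1, 0, 1, 0], [0, 1, 0, 1]], [0, 0])
```

In this toy case the filter responds positively to the left edge of the band and negatively to the right edge; ReLU discards the negative responses and pooling halves the spatial size before the final weighted sum.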

Figure 1 shows the CNN architecture, which extracts features by applying convolution to the
input image, reduces the size of the feature map in the pooling layer and classifies the result in the
fully connected layer. The first convolution layer usually extracts low-level features such as edges, while
the second convolution layer extracts higher-level features such as shape.

Figure 1. Convolutional neural network architecture [25]

2.2. Flower and Leaf Recognition
When it comes to quantifying flower and leaf images, the three most important attributes to be 
considered are color, texture and shape. 


2.2.1. Color

One of the important features for recognizing a flower is its color. The color histogram,
which counts how often each pixel intensity occurs in an image, is one of the simplest
global feature descriptors [5]. It enables the descriptor to capture the distribution of each color in an
image. The feature vector is formed by combining the counts for each color. However, color features
alone are not enough to quantify flowers because, in a multi-species environment, two or more species
can have the same color [5]. For example, daisy and magnolia have similar colors but are different
flowers and, of course, different plants.
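As an illustration of the descriptor described above, the following is a minimal sketch of a joint RGB color histogram in pure Python. The function name, the choice of 4 bins per channel and the four-pixel example are assumptions made for illustration and are not taken from [5].

```python
def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized joint RGB histogram as a flat feature vector.

    `pixels` is a list of (r, g, b) tuples with values in 0..255.
    `bins_per_channel` is assumed to divide 256 evenly.
    """
    bin_width = 256 // bins_per_channel
    hist = [0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Map the three per-channel bin indices to one position in the flat vector.
        index = ((r // bin_width) * bins_per_channel
                 + (g // bin_width)) * bins_per_channel + (b // bin_width)
        hist[index] += 1
    total = len(pixels)
    return [count / total for count in hist]


# Hypothetical four-pixel "image": three white pixels and one yellow pixel.
pixels = [(255, 255, 255)] * 3 + [(255, 255, 0)]
features = color_histogram(pixels)

# The same pixels in a different order produce the same feature vector,
# because the histogram ignores where each color occurs in the image.
same_colors_other_layout = color_histogram(list(reversed(pixels)))
```

The last two lines show why color alone cannot separate species with similar colors: any two images with the same color distribution map to the same vector, regardless of shape or texture.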


2.2.2. Texture

Texture is another important feature that can be used to recognize plant species. The Gray
Level Co-occurrence Matrix (GLCM) is one of the most important and earliest texture analysis approaches,
introduced by Haralick in 1973 [24]. According to S. Albawi, there are 14 statistical features that can be
calculated to quantify an image based on texture, and this requires a large input size [4].
The resulting feature vector is 13-dimensional; the 14th feature is ignored due to its high
computational time [5].
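To make the idea of a co-occurrence matrix concrete, the following is a minimal pure-Python sketch that builds a GLCM for one pixel offset and derives two of the statistical texture features (contrast and energy, the latter also known as the angular second moment). The function names and the 4 by 4 example image are illustrative assumptions; unlike Haralick's original symmetric formulation, this sketch counts each pixel pair in one direction only.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized gray level co-occurrence matrix for the offset (dx, dy).

    Entry [i][j] is the fraction of pixel pairs where a pixel with gray
    level i has a neighbor with gray level j at the given offset.
    """
    counts = [[0] * levels for _ in range(levels)]
    height, width = len(image), len(image[0])
    for y in range(height):
        for x in range(width):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                counts[image[y][x]][image[ny][nx]] += 1
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def contrast(p):
    """Weight each co-occurrence by the squared gray level difference."""
    return sum((i - j) ** 2 * p[i][j]
               for i in range(len(p)) for j in range(len(p)))


def energy(p):
    """Sum of squared entries; high for uniform, repetitive textures."""
    return sum(v ** 2 for row in p for v in row)


# Hypothetical 4x4 image with 4 gray levels (0..3).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, levels=4)          # horizontal neighbor, one pixel to the right
texture_features = [contrast(p), energy(p)]
```

For this image there are 12 horizontal pixel pairs, which gives a contrast of 7/12 and an energy of 1/6. A full texture descriptor would append further statistics to the same vector.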


2.2.3. Shape

Another important feature for quantifying an image is shape. Zernike moments,
introduced by Teague as a shape descriptor, and Hu moments are two widely used global shape descriptors in
computer vision research that can represent the shape of an object [5]. Moments are based on the statistical
expectations of a random variable. There are seven Hu moments, and together they form a 7-dimensional
feature vector [5]. However, some flowers have similar shapes, such as peonies and hydrangeas.
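The seven Hu moments can be computed from normalized central moments of the image. The following is a minimal pure-Python sketch using the standard Hu formulas; the function name and the small binary test shape are illustrative assumptions, and a practical system would normally rely on an image-processing library instead.

```python
def hu_moments(image):
    """Seven Hu moments of a 2-D grayscale or binary image (nested lists)."""
    m00 = sum(sum(row) for row in image)
    # Centroid of the intensity distribution.
    xc = sum(x * v for row in image for x, v in enumerate(row)) / m00
    yc = sum(y * v for y, row in enumerate(image) for v in row) / m00

    def eta(p, q):
        """Central moment of order (p, q), normalized for scale."""
        mu = sum(((x - xc) ** p) * ((y - yc) ** q) * v
                 for y, row in enumerate(image) for x, v in enumerate(row))
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return [
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ]


# Hypothetical binary shape (an "L" with legs of different length).
shape = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
# The same shape moved one pixel down and right, and rotated by 90 degrees.
shifted = [[0] * 7] + [[0] + row for row in shape]
rotated = [list(r) for r in zip(*shape[::-1])]

shape_features = hu_moments(shape)
```

The feature vector is unchanged when the shape is translated or rotated by 90 degrees, which is what makes these moments useful as a global shape descriptor. It also explains the limitation noted above: two flowers with a similar outline produce similar vectors.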


3. RESULTS AND ANALYSIS 

The CNN for this project was run on an Asus laptop with 64-bit Windows 10, an Intel Core i5 processor
and 4.00 GB of RAM, using Matlab 2018a. For this project,
three CNN experiments have been undertaken using two datasets: the Flower Recognition dataset, the
Folio Leaf dataset and a combination of both datasets.

All of the images were resized to 224 by 224 pixels to ensure the consistency of the data across
experiments. Experiments were conducted by changing the number of layers, the parameter values of
the convolution and pooling layers, and the learning rate. The purpose is to determine the combination of
parameters that produces the highest accuracy for plant recognition on each of the three datasets.
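How the number of stacks interacts with the 224 by 224 input can be traced with simple arithmetic. The following is a rough sketch under stated assumptions: convolutions are unpadded with stride 1, and the pooling stride equals the pooling size. Neither setting is stated in this paper, so the resulting sizes are illustrative only. Each stack is written as a (filter size, number of filters) pair, matching the bracket notation used in the result tables.

```python
def trace_stack_sizes(input_size, stacks, pool_size, pool_stride=None):
    """Return the (height, width, channels) of the output of each stack.

    stacks      -- list of (filter_size, num_filters) pairs, one per stack
    pool_size   -- side length of the max-pooling window
    pool_stride -- pooling stride; assumed equal to pool_size if omitted

    Assumes square inputs, unpadded convolution with stride 1, and that
    ReLU does not change the size.
    """
    if pool_stride is None:
        pool_stride = pool_size
    size = input_size
    trace = []
    for filter_size, num_filters in stacks:
        size = size - filter_size + 1                  # convolution
        size = (size - pool_size) // pool_stride + 1   # max pooling
        trace.append((size, size, num_filters))
    return trace


# Four stacks with 3x3 filters and a pooling window of 3.
four_stacks = [(3, 64), (3, 64), (3, 80), (3, 80)]
sizes = trace_stack_sizes(224, four_stacks, pool_size=3)
```

Under these assumptions the feature maps shrink from 224 to 74, 24, 7 and finally 1 pixel per side, so four stacks is close to the deepest configuration that a 224 by 224 input can support with this pooling setting.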

The results of the experiments are recorded in Table 1. In Table 1, the
first column indicates the number of stacks of layers, where a stack consists of one convolve layer, one
max-pooling layer and one ReLU layer. In the Convolve Layer column, the first number in the square brackets
represents the size of the convolve filter, while the second number represents the number of convolve filters.
The third column gives the size of the max-pooling filter and the stride. The number of epochs and the
learning rate are shown in column 4. The number of epochs determines how many times the entire
training data is passed through the network, while the learning rate is the amount of adjustment made to the
weights during training. The weights are adjusted until the error rate is minimal; this
constitutes the learning process of the CNN.
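The roles of the epoch count and the learning rate can be seen in a toy example. The following is a minimal sketch that fits a single weight by gradient descent; the one-parameter model and the three training samples are hypothetical and far simpler than a CNN, but the update rule has the same structure.

```python
def train(samples, epochs, learning_rate):
    """Fit y = w * x by per-sample gradient descent.

    Returns the final weight and the mean squared error after each epoch.
    """
    w = 0.0
    errors = []
    for _ in range(epochs):                  # one epoch = one full pass over the data
        for x, y in samples:
            gradient = 2 * (w * x - y) * x   # derivative of the squared error w.r.t. w
            w -= learning_rate * gradient    # the learning rate scales the adjustment
        errors.append(sum((w * x - y) ** 2 for x, y in samples) / len(samples))
    return w, errors


# Hypothetical training data generated by y = 2 * x.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

w_slow, errors_slow = train(samples, epochs=10, learning_rate=0.001)
w_fast, errors_fast = train(samples, epochs=10, learning_rate=0.01)
```

With the same 10 epochs, the error falls after every epoch in both runs, but the larger learning rate moves the weight much closer to the true value of 2. This is why the learning rate and the number of epochs are tuned together.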


3.1. Flower Recognition Dataset 

The Flower Recognition dataset comprises 4242 flower images scraped from Google Images, Flickr
and Yandex Images [25]. However, for the purpose of this
project, only 100 images have been randomly chosen from 5 categories: daisy, dandelion, rose,
sunflower and tulip. Figure 2 shows some example images from this dataset.

Figure 2. Flower recognition dataset [26]

Table 1 shows the accuracy achieved from CNN testing on the Flower Recognition dataset, where
different stacks of layers were tested to examine the effect of the number of layers on
accuracy. The highest accuracy, 74%, was obtained with four stacks of layers.














Table 1. Experimental Results on Parameter Tuning for Basic CNN based on Flower Recognition Dataset

No of Stack  Convolve Layer                     Pooling layer  Epoch,         Accuracy (%)  Total Time
of Layers                                       and Stride     Learning Rate
1            [3,64]                             3              10, 0.001      68            3 min 18s
1            [3,16]                             2              10, 0.001      62            3 min 14s
2            [3,16], [3,48]                     3              10, 0.001      66            6 min 55s
2            [3,64], [3,80]                     3              10, 0.001      70            O min 34s
3            [3,64], [3,64], [3,128]            3              10, 0.001      68            3 min 44s
3            [5,20], [3,20], [3,16]             3              10, 0.001      64            3 min 35s
4            [3,64], [3,64], [3,80], [3,80]     3              10, 0.001      64            4 min 35s
4            [3,80], [3,80], [3,256], [3,256]   3              10, 0.001      74            4 min 35s


3.2. Folio Leaf Dataset 

The second dataset used is the Folio Leaf dataset [26]. The leaf pictures were taken from plants on the
farm of the University of Mauritius and nearby locations. There are 32 categories of plants with 20 leaf
images per category. However, for this project, only 5 categories with 20 images
each are used. Figure 3 shows some sample images from this dataset.

Figure 3. Folio leaf dataset [27]





Table 2 shows the accuracies achieved from CNN testing on this dataset with different
stacks of layers. The highest accuracy, 98%, was produced with 4 stacks of layers.


Table 2. Experimental Results on Parameter Tuning for Basic CNN based on Folio Leaf Dataset

No of Stack  Convolve Layer                     Pooling layer  Epoch,         Accuracy (%)  Total Time
of Layers                                       and Stride     Learning Rate
1            [3,16]                             3              10, 0.001      71.92         2 min 32s
1            [5,20]                             2              10, 0.0001     65.62         3 min 23s
2            [3,16], [3,16]                     3              10, 0.001      79.81         4 min 27s
2            [3,80], [3,64]                     2              10, 0.001      73.82         2 min 40s
3            [3,16], [3,16], [3,32]             3              10, 0.001      76.66         3 min 44s
3            [5,20], [3,20], [3,16]             3              10, 0.001      82.03         3 min 35s
4            [3,64], [3,64], [3,80], [3,80]     3              10, 0.001      98            3 min 20s


3.3. Flower and Leaf Dataset

In this experiment, a combination of the Flower Recognition dataset and the Folio Leaf dataset has been
tested. The combined dataset consists of 10 categories, 5 from the Flower Recognition dataset and 5
from the Folio Leaf dataset, with a total of 200 images. Table 3 lists the accuracies achieved
from CNN testing on the combined dataset, where 85% accuracy was obtained with four stacks of layers.


Table 3. Experimental Results on Parameter Tuning for Basic CNN using Flower Recognition
and Folio Datasets

No of Stack  Convolve Layer                     Pooling layer  Epoch,         Accuracy (%)  Total Time
of Layers                                       and Stride     Learning Rate
1            [3,48]                             3              10, 0.001      74            5 min 15s
1            [3,80]                             3              10, 0.001      72            5 min 12s
2            [3,64], [3,80]                     3              10, 0.001      76            6 min 55s
2            [3,64], [3,128]                    2              10, 0.001      60            7 min 25s
3            [3,64], [3,64], [3,128]            5              10, 0.001      66            9 min 56s
3            [5,20], [3,64], [3,80]             3              10, 0.001      fe:           3 min 46s
4            [3,64], [3,64], [3,80], [3,80]     3              10, 0.001      85            7 min 26s


3.4. Overview of All Datasets 

Table 4 provides an overview of the highest accuracy produced by the CNN on
all three datasets. The best accuracy, 98%, was achieved with the
Folio Leaf dataset. This shows that not all datasets produce good results when tested with a
CNN. This is illustrated by the Flower Recognition dataset, which only achieved 74% accuracy even
when tested using the same specifications as the other datasets, which were kept fixed to ensure the
consistency of the experiment.


Table 4. The Performance Overview for Analysis of CNN for Flower and Leaf Recognition

Dataset             No of Stack  Convolve Layer                     Pooling layer  Epoch,         Accuracy (%)  Total Time
                    of Layers                                       and Stride     Learning Rate
Flower Recognition  4            [3,80], [3,80], [3,256], [3,256]   3              10, 0.001      74            4 min 35s
Folio Leaf          4            [3,64], [3,64], [3,80], [3,80]     3              10, 0.001      98            3 min 20s
Flower and Leaf     4            [3,64], [3,64], [3,80], [3,80]     3              10, 0.001      85            7 min 26s

Table 4 shows that plant identification based on images of leaves is
more accurate than identification based on images of flowers or a combination of both. The accuracy of
flower recognition only reaches 64% when using the same architecture as listed in Table 4 for leaf
recognition and for the combination of both.


4. CONCLUSION 

This paper evaluates the performance of a CNN for plant identification using images of leaves
only, flowers only and a combination of both. The experimental results show that using
images of leaves only achieves the best accuracy. This may be because flowers vary much more
in shape and colour than leaves. For future research, we plan to investigate
other variations of the CNN architecture and to evaluate them on other plant datasets with more training data.